Motion illusion-like patterns extracted from photo and art images using predictive deep neural networks

In our previous study, we successfully reproduced the illusory motion perceived in the rotating snakes illusion using deep neural networks incorporating predictive coding theory. In the present study, we further examined the properties of the network using a set of 1500 images, including ordinary static images of paintings and photographs and images of various types of motion illusions. The results showed that the networks clearly distinguished the group of illusory images from the other groups and reproduced illusory motion for various types of illusions in a manner similar to human perception. Notably, the networks occasionally detected anomalous motion vectors even in ordinary static images in which humans perceived no illusory motion. Additionally, illusion-like designs with repeating patterns were generated from the areas where anomalous vectors were detected, and psychophysical experiments showed that humans perceived illusory motion in the generated designs. The observed inaccuracy of the networks will provide useful information for further understanding the information processing associated with human vision.

www.nature.com/scientificreports/
color illusions [22][23][24], the flash-lag effect 22, and gestalt closure 25. DNNs allow changes to the structure of a given network and to the weights of its connections, alterations that cannot be applied to a living brain. Our research group has studied motion illusions by attempting to reproduce them using DNNs and comparing the results with human perception. We previously focused on the relationship between the occurrence of an illusion and the predictive function of the brain 26,27. In that study, we constructed a DNN model incorporating predictive coding theory 28 as a theoretical model of the cerebrum [29][30][31] and trained it on first-person-viewed videos 27. The DNN model predicted motion in the rotating snakes illusion in a manner similar to human perception, suggesting that the DNN model could be used as a tool for studying the subjective perception of motion illusions.
In the previous paper, we analyzed only the rotating snakes illusion as a representative example of motion illusion. In the present study, we analyzed a variety of motion illusions using the DNN model and attempted to generate predictive images using ordinary static image datasets that included photographs and paintings. Additionally, we conducted psychophysical tests on human subjects and compared predictions by the DNN model with the results from human perceptions.

Methods
Deep neural networks. The connection-weight model of the trained DNN (PredNet; written in Chainer) used in this study was identical to the 500K model described previously 27. In brief, PredNet is a DNN that predicts future video frames from a past time series of video frames. It outputs the predicted image from a convolutional LSTM (long short-term memory) network and is trained so that the error (mean squared error) between the real future image and the prediction is reduced. The unique feature of this network is that the prediction error, rather than the real image itself, is the input to the convolutional LSTM network. To train the DNN, we used a video from the First-person Social Interactions Dataset (http://ai.stanford.edu/~alireza/Disney/). The video contains footage of days in the life of eight subjects at the Disney World Resort in Orlando, Florida, recorded by cameras attached to hats worn by the walking subjects. PredNet is therefore expected to learn the spatiotemporal characteristics of the world from first-person information. The connection-weight model used here was obtained by training on 500,000 video frames.
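The role of the prediction error described above can be illustrated with a minimal sketch. It assumes the rectified positive/negative error split used in the original PredNet formulation and operates on flattened toy frames; the function names and values are hypothetical and are not taken from the Chainer implementation used in this study.

```python
def relu(value):
    """Rectified linear unit for a single number."""
    return value if value > 0.0 else 0.0


def error_units(actual, predicted):
    """Prediction error split into rectified positive and negative parts.

    Assumption: this mirrors the error representation of the original
    PredNet formulation; it is this error, not the raw frame, that is
    passed on to the recurrent (convolutional LSTM) part of the network.
    """
    positive = [relu(a - p) for a, p in zip(actual, predicted)]
    negative = [relu(p - a) for a, p in zip(actual, predicted)]
    return positive + negative


def mean_squared_error(actual, predicted):
    """Training objective named in the text: MSE between frame and prediction."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical flattened pixel values of a real frame and its prediction.
frame = [0.2, 0.8, 0.5]
prediction = [0.3, 0.6, 0.5]

errors = error_units(frame, prediction)
loss = mean_squared_error(frame, prediction)
print(errors, loss)
```

In this toy case only the second pixel is under-predicted and only the first is over-predicted, so exactly one entry in each half of `errors` is non-zero.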

Test images. To test the predictions of the DNN model, we prepared five groups of test image stimuli: motion illusions (n = 300), modern art paintings (n = 300), classic art paintings (n = 300), photographs of movable objects (n = 300), and photographs of still objects (n = 300). The motion illusions were originally created by Drs. Akiyoshi Kitaoka 32 (299 images) and Eiji Watanabe 33 (1 image). The images of art paintings were randomly collected from WikiArt (https://www.wikiart.org) according to their classification. The photographs were collected at random with icrawler 34, a framework of web crawlers (license = "noncommercial, modify" and keywords = "car," "building," "cat," etc.), followed by manual classification as "Movable objects" (animals, vehicles, etc.) or "Still objects" (buildings, mountains, etc.). Images were trimmed and scaled down so that the final size of all images was 160 × 120 pixels (width × height), matching the training images. The five groups of test image stimuli (1500 images in total) were shared as the "Visual Illusions Dataset" 35.
Prediction. The DNN model predicted the 22nd image (P1 image) from 21 consecutive images, all of which were copies of a single test image. The network then predicted the 23rd image (P2 image) from 22 consecutive images, using the P1 image as the 22nd image. The optical flow vectors between the P1 and P2 images were then calculated by the Lucas-Kanade 36 and Farneback 37 methods using a customized Python program (window size 50 and quality level 0.3 for Lucas-Kanade; window size 10, stride 5, and min_vec 0.01 for Farneback). The details of the protocol are essentially the same as in those two papers. In brief, the feature points are extracted sparsely in the Lucas-Kanade method and densely in the Farneback method. Then, the optical flow between the two images is calculated using the least-squares criterion, starting from the pixels of the feature points. Both methods assume that the flow is essentially constant in a local neighborhood of the pixel of the feature point. Refer to Fig. 4 for an example of the two analysis methods.
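The prediction protocol and the least-squares idea behind the flow estimate can be sketched as follows. This is a simplified illustration, not the customized program used in the study: `predict_next` is a hypothetical stand-in for the trained network, and the flow is estimated for a single fixed window rather than from detected feature points or an image pyramid. The synthetic pattern and its sub-pixel shift are assumptions chosen so that the true displacement is known.

```python
import math


def predict_p1_p2(image, predict_next, n_copies=21):
    """Return (P1, P2): the 22nd and 23rd frames predicted from copies of one image.

    `predict_next` is a hypothetical callable that maps a list of frames
    to the next predicted frame.
    """
    frames = [image] * n_copies
    p1 = predict_next(frames)         # predicted from 21 identical frames
    p2 = predict_next(frames + [p1])  # predicted from 22 frames, P1 last
    return p1, p2


def lucas_kanade_window(img1, img2, x0, y0, half):
    """Least-squares flow (u, v) for one square window centered at (x0, y0).

    Assumes the flow is constant inside the window and solves the 2 x 2
    normal equations of  Ix*u + Iy*v + It = 0  over all window pixels.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0  # horizontal gradient
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0  # vertical gradient
            it = img2[y][x] - img1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # textureless window: flow is undetermined
        return 0.0, 0.0
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def pattern(shift_x, shift_y, width=60, height=60):
    """Smooth synthetic image displaced by (shift_x, shift_y) pixels."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
             for x in range(width)] for y in range(height)]


# Stand-ins for P1 and P2: the same pattern displaced by (0.2, -0.1) pixels.
u, v = lucas_kanade_window(pattern(0.0, 0.0), pattern(0.2, -0.1), 30, 30, 20)
print(round(u, 2), round(v, 2))
```

With these synthetic frames the recovered vector is close to the imposed displacement of (0.2, −0.1) pixels; with real predicted images the same computation would be repeated at many feature points.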
Psychophysical experiment. The visual stimuli used in the psychophysical experiment were created by cropping regions from a photograph (image A) and a painting (image B), as shown in Fig. 4. The cropped image was duplicated 20 times and combined horizontally, and the combined image was transformed into a circle using warpPolar, the polar conversion function in OpenCV (v.4.2.0.32; https://opencv.org). A white rectangular image was concatenated under the cropped image to reduce the effect of distortion caused by the deformation. The width w of the white image is the same as the width x of the cropped image, and the height h is obtained from the following equation for the circumference of the circle with radius r = h + y/2, where y is the height of the cropped image:
$$ n_s x = 2\pi r = 2\pi\left(h + \frac{y}{2}\right) \qquad (1) $$
where n_s is the number of repetitions (20). The values of h are obtained by the following equation:
$$ h = \frac{n_s x}{2\pi} - \frac{y}{2} $$
Fractional parts of the values were rounded down. The sizes of the cropped images A and B are 9 × 26 pixels and 8 × 18 pixels (width × height), and h is 15 pixels and 16 pixels, respectively. The image size output from the polar conversion function was set to 1024 × 1024 pixels, and the output images were used for the psychophysical experiment. For the test of the prediction of the DNN model, these images were further resized to 120 × 120 pixels and then placed at the center of a white image with a size of 160 × 120 pixels. The psychophysical experiment was designed based on the method of Hisakata et al. 38 and conducted using a program written in Python using OpenGL (v.3.1.5; https://pypi.org/project/PyOpenGL/). The subjects were the authors T.K. and E.W. plus three naïve subjects (n = 5; all healthy subjects with normal vision). The subjects were asked to report by keyboard input whether they saw the stimuli rotate clockwise (CW) or counterclockwise (CCW), using a two-alternative judgment.
The face of each subject was fixed at 50 cm from the screen, and only the right eye was used for viewing. A fixation point with a viewing angle of 1° was placed at the center of a white background, and the stimulus, with an outer diameter of 7° and an inner diameter of 1°, was presented at 12° to the left of the center for 0.5 seconds. The subjects looked at the fixation point and viewed the stimulus with their peripheral vision. After they responded, the next stimulus was presented, with a minimum of 1 second between the presentation of the previous stimulus and that of the next stimulus.
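As a quick check of the stimulus geometry, the height h of the white strip can be recomputed from the circumference relation n_s x = 2π(h + y/2) and the crop sizes reported for images A and B. The function name below is hypothetical; the numbers are those given in the text.

```python
import math


def padding_height(crop_width, crop_height, repetitions=20):
    """Height h of the white strip placed under the crop, rounded down.

    The repeated crops form a circumference of repetitions * crop_width
    pixels at radius r = h + crop_height / 2, so
    h = repetitions * crop_width / (2 * pi) - crop_height / 2.
    """
    radius = repetitions * crop_width / (2.0 * math.pi)
    return math.floor(radius - crop_height / 2.0)


h_a = padding_height(9, 26)  # image A crop: 9 x 26 pixels (width x height)
h_b = padding_height(8, 18)  # image B crop: 8 x 18 pixels (width x height)
print(h_a, h_b)  # 15 16
```

The results, 15 and 16 pixels, agree with the values of h reported for images A and B.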
To quantitatively examine the illusory motion of the stimuli, we intentionally rotated the stimuli and determined the conditions under which motion was not perceived. We prepared two types of images: the original image and its left-right reversed version. This was done to counteract bias in the perceived rotation velocity. The velocities of the intentional rotation ranged from −2.1 to +2.1°/s at intervals of 0.3°/s, giving a total of 15 different rotation velocities. To statistically analyze the responses, we presented each condition 30 times, with the stimulus type and rotation velocity varied at random. Since there were 2 types of images, 15 velocities, and 30 repetitions, each subject was presented with 900 stimuli in total. From the data obtained by this procedure, we calculated the rotational velocity at which the subject was equally likely to answer that the stimulus was rotating in the CW and CCW directions. Figure 6 shows the raw data of the psychophysical experiment. The horizontal axis represents the velocity at which the stimulus was intentionally rotated (with CCW as the positive direction), and the vertical axis represents the probability of responding that the stimulus rotated CCW. Each psychometric curve was fitted with a cumulative Gaussian function to calculate the rotation-cancellation velocity, that is, the rotational velocity at which the probability was 0.5. The rotation-cancellation velocity is the velocity required to cancel the rotation of the presented image, so the rotation due to the motion illusion of the image is this velocity with its sign inverted. Therefore, we used the original and reversed stimuli to calculate the rotational velocity of the stimulus as follows:
Ethics statement. The study protocol was performed according to the Declaration of Helsinki and was approved by the Ethics Committee of the National Institute for Physiological Sciences (permit No. 20A063). The psychological experiments were performed with the informed consent of all subjects. Informed consent included permission to disclose the subject's initials.
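The psychometric analysis described in the psychophysical procedure above (a cumulative Gaussian fitted to the probability of a CCW response, read out at probability 0.5) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the fit uses a coarse grid search with a least-squares criterion rather than the authors' fitting routine, and the response probabilities are synthetic, generated from hypothetical parameters. It does not reproduce the final step that combines the original and reversed stimuli.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Probability of a CCW response at physical rotation velocity x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_psychometric(velocities, p_ccw):
    """Least-squares grid search for (mu, sigma); a stand-in for a real optimizer."""
    best_sse, best_mu, best_sigma = float("inf"), 0.0, 1.0
    for mu_step in range(-150, 151):       # mu from -3.0 to 3.0 in steps of 0.02
        mu = mu_step * 0.02
        for sigma_step in range(1, 61):    # sigma from 0.05 to 3.0 in steps of 0.05
            sigma = sigma_step * 0.05
            sse = sum((cumulative_gaussian(x, mu, sigma) - p) ** 2
                      for x, p in zip(velocities, p_ccw))
            if sse < best_sse:
                best_sse, best_mu, best_sigma = sse, mu, sigma
    return best_mu, best_sigma


# 15 physical rotation velocities from -2.1 to +2.1 deg/s in 0.3 deg/s steps.
velocities = [round(-2.1 + 0.3 * i, 1) for i in range(15)]
total_trials = 2 * len(velocities) * 30  # image types x velocities x repetitions

# Synthetic response probabilities from hypothetical parameters (not real data).
p_ccw = [cumulative_gaussian(x, 0.6, 0.8) for x in velocities]

mu_fit, sigma_fit = fit_psychometric(velocities, p_ccw)
cancellation_velocity = mu_fit     # velocity at which P(CCW) = 0.5
illusory_velocity = -mu_fit        # illusory rotation has the opposite sign
print(total_trials, round(mu_fit, 2), round(illusory_velocity, 2))
```

In this synthetic example the fit recovers the cancellation velocity used to generate the data, and the sign inversion gives the direction of the illusory rotation (negative meaning CW under the CCW-positive convention).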
Open-source software. All program codes (DNN, optical flow analysis, and psychophysical stimulus presentation software), trained models, and stimulus images were released as open-source software at the following website.

Results
Figure 1 shows examples of the optical flow vectors detected by the Lucas-Kanade method in the images predicted by the model for the five stimulus groups. Relatively large and/or well-aligned optical flow vectors were detected in the images predicted for motion illusions, whereas only relatively small optical flows were detected in the images predicted for the other groups. The direction of the motion vectors detected for the motion illusions agreed with the direction of the illusory motion perceived by humans. Notably, directed optical flows were detected not only in illusions composed of many colors, shapes, and gradients but also in an illusion composed of simple white triangles (Fig. 1, upper left). As a fundamental property of the methodology, the Lucas-Kanade method extracts objects with a characteristic shape from images as feature points and uses them as the starting points of the optical flow. Therefore, the fact that the eyes and hands were selected (e.g., Mona Lisa and President Obama) is not in itself very meaningful; they were simply extracted as feature points and used as starting points for the optical flow.
For quantitative analysis, the frequency rates of the absolute values of the optical flow vectors detected in each image group were evaluated (top two graphs in Fig. 2), and the averages of the absolute values of the optical flow vectors for each image group were calculated (Fig. 3). The top two graphs in Fig. 2 show a noticeable difference in the frequency distribution between the motion illusion images and the other image groups. In particular, near the modes of the non-motion-illusion groups, the frequency for motion illusions was much lower than that for the other groups. The results were as follows: motion illusions, 0.71 ± 0.18 (arbitrary units; Lucas-Kanade) and 0.64 ± 0.034 (Farneback); modern art paintings, 0.052 ± 0.0029 and 0.24 ± 0.021; realistic art paintings, 0.036 ± 0.00088 and 0.092 ± 0.0014; movable object photographs, 0.035 ± 0.0013 and 0.11 ± 0.0020; and still object photographs, 0.037 ± 0.0013 and 0.11 ± 0.0026. These results indicate that larger optical flow vectors were detected in the motion illusion group than in the other groups. This tendency held for both the Lucas-Kanade and Farneback analyses. These findings suggest that the DNN model accurately distinguished the group of illusory images from the others. However, as shown in Fig. 2, relatively large optical flow vectors were also detected in a small number of images from groups other than the motion illusion group. To investigate the cause of such exceptionally large optical flow vectors, two images (one photograph and one painting) for which notably large optical flows were predicted were identified, and their P1 and P2 images were compared in detail. Figure 5 shows a plot of the brightness values in the regions where an exceptionally large optical flow was detected. Comparing P1 (Fig. 5, green line) with P2 (Fig. 5, blue line) revealed a shift between the patterns of the two brightness distributions.
These results indicate that the DNN model incorrectly predicted motion for static images that a human would not recognize as moving.
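The group-level quantities used in this analysis (absolute values of the flow vectors, their average, and their frequency rates) can be computed along the following lines. The vectors, bin width, and number of bins are hypothetical and chosen only for illustration; the binning used for Fig. 2 and the meaning of the ± terms are not specified in the text and are not reproduced here.

```python
import math


def magnitudes(vectors):
    """Absolute values (lengths) of 2-D optical flow vectors."""
    return [math.hypot(u, v) for u, v in vectors]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def frequency_rates(values, bin_width, n_bins):
    """Fraction of values in each bin; values beyond the range go to the last bin."""
    counts = [0] * n_bins
    for value in values:
        index = min(int(value / bin_width), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]


# Hypothetical flow vectors (u, v) detected in one image group.
vectors = [(0.3, 0.4), (0.0, 0.05), (0.6, 0.8)]

lengths = magnitudes(vectors)
average_length = mean(lengths)
rates = frequency_rates(lengths, bin_width=0.2, n_bins=5)
print(round(average_length, 3), rates)
```

Comparing such frequency rates and averages across image groups is what separates the motion illusion group from the others in the analysis above.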
We then hypothesized that the patterns of exceptionally large optical flows detected in the photograph and the painting might exhibit characteristics of motion illusions. We noted that most motion illusions have a repeating structure. Therefore, the areas where the large optical flows were detected (Fig. 5, boxed regions) were excised and reassembled into circular repeating structures (Fig. 6), followed by psychophysical experiments with five human subjects. These experiments showed that the artificially created designs produced a type of motion illusion in human perception (Fig. 6). The strength of the detected rotational velocity and the relative relationship between images A and B differed among the five subjects. The direction of rotation estimated from the optical flow analysis (Fig. 4) and luminance analysis (Fig. 5) of the DNN-predicted images was counterclockwise for both illusion-like designs. This estimated direction of rotation coincided with the direction of perceptual rotation obtained in the psychophysical experiment for the illusion-like design derived from image A, but not for the illusion-like design derived from image B. This trend was the same for all subjects. Next, each illusion-like design was input into the DNN model, predictive images were generated, and optical flow analysis and luminance distribution analysis were performed (the second and fourth rows in Fig. 4). As a result, large optical flows and luminance shifts were observed in the illusion-like design derived from image A, but no rotational motion in a single direction was observed. In the illusion-like design derived from image B, only small flows and luminance shifts were observed.

Discussion
In this study, we showed that the DNN model distinguished between the illusion group and the other groups of ordinary photographs and paintings, although it occasionally predicted motion in some parts of the ordinary images (Fig. 4). Interestingly, we were able to create new motion illusions from the target portions of these images. It is possible that motion illusions contain small unit structures and that a background of normal scenery suppresses the occurrence of illusory motion when such a unit structure exists alone. We previously suggested the existence of unit structures in our recent study on the rotating snakes illusion and the Fraser-Wilcox illusion 39. The illusion-like designs shown in Fig. 6 are among the first illusions to be discovered with the aid of artificial intelligence. Other reported examples include illusion generators based on an evolutionary algorithm 40, a generative adversarial network 41, and a statistical model 42. In every case, a module that artificially models human vision is essential for generating illusions. The type of illusion created is presumably influenced by the type of brain function modeled in the module. With the evolutionary algorithm, motion illusions were generated using the same connection-weight model as in this paper; with the generative adversarial network, color and contrast illusions were generated by CNNs trained on static images; and with the statistical model, color illusions were generated by a patch likelihood estimation model trained on static natural images. Such a methodology would not only synthesize new visual illusions useful for vision researchers but also provide a new way to study the similarities and differences between artificial models and human visual perception.
Figure 6. The psychophysical experiments. The boxes in images A and B (Fig. 5) were excised to create two motion-illusion-like designs (ring-shaped designs inserted into the figure), followed by psychophysical experiments using five subjects. Each psychometric curve was fitted with the cumulative Gaussian function using the least-squares method. The probability of seeing counterclockwise (CCW) rotation was plotted against the rotational velocity. The red and blue charts for each subject correspond to the original image and its left-right reversed version. The dots and the curves in the graph represent the raw probability data and the best-fit curve for the stimulus, respectively. For the design from image A, the subjects showed a high probability of answering that it was rotating CCW, whereas for the design from image B, the subjects showed a high probability of answering that it was rotating CW.
Many motion illusions present a "repetition" of unit structures. As noted, we presume that the presence of even one of these unit structures can potentially cause the perception of motion. However, no single unit structure alone can cause the perception of illusory motion, which suggests that local information might lead to the perception of motion only when it is combined with global information. Supporting evidence suggests that a wide range of brain regions, from V1 to MT+, are involved in the perception of motion illusions 9, with higher brain regions (e.g., MT+) thought to integrate information from a broader perspective than V1. The DNN model was capable of detecting motion flow in the unit structures embedded in the photographs and paintings, motion that was not perceived by humans (the first and third rows in Fig. 4). For the illusion-like designs that repeated unit structures extracted from photographs or paintings, humans perceived motion, but not with the direction or relative magnitude of motion predicted by the DNN model (Fig. 6 and the second and fourth rows in Fig. 4).
The discordance between the two could indicate that the DNN model is underdeveloped in its ability to integrate global information, which is thought to be performed in higher brain regions and other areas. For artificial perception to be useful for basic research of human perception, further studies are required.